fix[next-dace]: Fix Memory Layout for CPU by philip-paul-mueller · Pull Request #2459 · GridTools/gt4py

philip-paul-mueller · 2026-01-28T08:01:47Z

Previously, the optimizer assumed that the memory layout differed between GPU and CPU: on CPU the stride-1 dimension was taken to be the vertical dimension, while on GPU it was taken to be the horizontal dimension. This is wrong; in both cases stride 1 is associated with the horizontal dimension.
This PR fixes that: the loop order and the memory layout of transients now assume that stride 1 is associated with the horizontal dimension.

Note that the current implementation assumes that there is only one horizontal dimension.
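
To illustrate what "stride 1 is associated with the horizontal dimension" means for layout and loop order, here is a minimal pure-Python sketch. It is not the code in `auto_optimize.py`; the helpers `element_strides` and `loop_order` and the array sizes are hypothetical names and values chosen only for this example.

```python
def element_strides(shape, unit_axis):
    """Element strides for `shape` such that `unit_axis` has stride 1.

    The remaining axes keep their relative order (assumption for this sketch).
    """
    order = [unit_axis] + [ax for ax in range(len(shape)) if ax != unit_axis]
    strides = [0] * len(shape)
    step = 1
    for ax in order:
        strides[ax] = step
        step *= shape[ax]
    return tuple(strides)


def loop_order(strides):
    """Axis indices from outermost to innermost loop.

    The innermost loop runs over the axis with the smallest stride.
    """
    return sorted(range(len(strides)), key=lambda ax: strides[ax], reverse=True)


# Hypothetical field indexed as (horizontal, vertical).
HORIZONTAL, VERTICAL = 0, 1
shape = (8, 3)

strides = element_strides(shape, unit_axis=HORIZONTAL)
order = loop_order(strides)
print(strides)  # horizontal axis has stride 1
print(order)    # vertical is the outer loop, horizontal the inner loop
```

With this layout, consecutive horizontal points are adjacent in memory, so the innermost loop walks memory contiguously.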

havogt · 2026-01-28T08:57:53Z

src/gt4py/next/program_processors/runners/dace/transformations/auto_optimize.py

-        unit_strides_kind = (
-            gtx_common.DimensionKind.HORIZONTAL if gpu else gtx_common.DimensionKind.VERTICAL
-        )
+        unit_strides_kind = gtx_common.DimensionKind.HORIZONTAL


Why does that make sense? You cannot assume anything...

Or is that just for transients? Then I would change the comment assume -> set or something.

There are two things here. First, the name is bad and should probably be something else.
However, the value selection is correct; one could even argue that it is the only one that makes sense.
The reason is that the maximal number of blocks is different for each direction, and because (for ICON) size(horizontal) >>> size(vertical), one would get launch errors otherwise.
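
A rough sketch of the launch-limit argument, assuming the CUDA per-dimension grid limits for compute capability 3.0 and later (2^31 - 1 blocks in x, 65535 in y and z). The domain sizes, the thread-block shape, and the mapping of the leading dimension to the x direction are illustrative assumptions, not values taken from this PR.

```python
import math

# Per-dimension grid limits in blocks (CUDA, compute capability >= 3.0).
MAX_GRID_BLOCKS = {"x": 2**31 - 1, "y": 65535, "z": 65535}


def blocks_needed(extent, threads_per_block):
    """Number of blocks required to cover `extent` iterations."""
    return math.ceil(extent / threads_per_block)


def launch_fits(extents, threads):
    """True if every grid dimension stays within its block limit."""
    return all(
        blocks_needed(extents[d], threads[d]) <= MAX_GRID_BLOCKS[d]
        for d in extents
    )


# Hypothetical sizes with size(horizontal) >>> size(vertical).
n_horizontal, n_vertical = 1_000_000, 80
threads = {"x": 32, "y": 1}  # hypothetical thread-block shape

# Horizontal as leading dimension (mapped to x in this sketch).
horizontal_on_x = launch_fits({"x": n_horizontal, "y": n_vertical}, threads)
# Vertical as leading dimension: horizontal ends up on y.
vertical_on_x = launch_fits({"x": n_vertical, "y": n_horizontal}, threads)

print(horizontal_on_x, vertical_on_x)
```

Under these assumptions only the first configuration stays within the limits, which matches the reasoning that the horizontal kind has to be the leading one on GPU.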


edopao

LGTM, only one refactoring suggestion.

src/gt4py/next/program_processors/runners/dace/transformations/auto_optimize.py

philip-paul-mueller · 2026-01-29T13:51:45Z

There is some variability but I think there is nothing pathological going on.

edopao

LGTM

philip-paul-mueller added 2 commits January 28, 2026 08:50

Updated the CPU memory order.

6f1c95b

Made some additional notes.

2027ad6

philip-paul-mueller requested review from edopao, havogt and iomaganaris January 28, 2026 08:01

philip-paul-mueller marked this pull request as ready for review January 28, 2026 08:02

havogt reviewed Jan 28, 2026

edopao changed the title ~~fix[dace-next]: Fix Memory Layout for CPU~~ fix[next-dace]: Fix Memory Layout for CPU Jan 28, 2026

philip-paul-mueller added 2 commits January 28, 2026 15:12

Merge remote-tracking branch 'gt4py/main' into fixed_cpu_order__2026-…

d1d378b

…01-28

Updated the description and naming a bit.

6179a6a

philip-paul-mueller requested a review from havogt January 28, 2026 14:45

Changed the selection of the leading kind and also clarified on the d…

88ac3ba

…escription. If the leading kind is not known then it will not reorder strides nor the iteration order. However, for certain reasons (launch errors) we have to set one for GPU in that case.

edopao reviewed Jan 29, 2026

src/gt4py/next/program_processors/runners/dace/transformations/auto_optimize.py

Added a compatibility layer for ICON4Py.

f390cff

philip-paul-mueller mentioned this pull request Jan 29, 2026

DO NOT MERGE: Test new CPU strides C2SM/icon4py#1017

Draft

Updated the warning.

080c669

philip-paul-mueller requested a review from edopao January 29, 2026 09:53

edopao mentioned this pull request Jan 29, 2026

feat[next]: Muphys staging #2462

Closed

Removed the compatibility hack.

ad3a048

Correction.

458f998

edopao approved these changes Jan 30, 2026

Merge branch 'main' into fixed_cpu_order__2026-01-28

fd68136

philip-paul-mueller mentioned this pull request Feb 3, 2026

perf[next-dace]: RemoveScalarCopies and FuseHorizontalConditionBlocks transformations #2469

Merged

2 tasks

Merge branch 'main' into fixed_cpu_order__2026-01-28

1e8e3bd

philip-paul-mueller merged commit ce0e807 into GridTools:main Feb 4, 2026
23 checks passed

philip-paul-mueller merged 11 commits into GridTools:main from philip-paul-mueller:fixed_cpu_order__2026-01-28

3 participants
